NaCTeM at SemEval-2016 Task 1: Inferring sentence-level semantic similarity from an ensemble of complementary lexical and sentence-level features
نویسندگان
چکیده
We present a description of the system submitted to the Semantic Textual Similarity (STS) shared task at SemEval 2016. The task is to assess the degree to which two sentences carry the same meaning. We have designed two different methods to automatically compute a similarity score between sentences. The first method combines a variety of semantic similarity measures as features in a machine learning model. In our second approach, we employ training data from the Interpretable Similarity subtask to create a combined wordsimilarity measure and assess the importance of both aligned and unaligned words. Finally, we combine the two methods into a single hybrid model. Our best-performing run attains a score of 0.7732 on the 2015 STS evaluation data and 0.7488 on the 2016 STS evaluation data.
منابع مشابه
DeepPurple: Estimating Sentence Semantic Similarity using N-gram Regression Models and Web Snippets
We estimate the semantic similarity between two sentences using regression models with features: 1) n-gram hit rates (lexical matches) between sentences, 2) lexical semantic similarity between non-matching words, and 3) sentence length. Lexical semantic similarity is computed via co-occurrence counts on a corpus harvested from the web using a modified mutual information metric. State-of-the-art...
متن کاملNORMAS at SemEval-2016 Task 1: SEMSIM: A Multi-Feature Approach to Semantic Text Similarity
This paper presents the submission of our team (NORMAS) to the SemEval 2016 semantic textual similarity (STS) shared task. We submitted three system runs, each using a set of 36 features extracted from the training set. The runs explore the use of the following three machine learning algorithms: Support Vector Regression, Elastic Net and Random Forest. Each run was trained using sentence pairs ...
متن کاملUdL at SemEval-2017 Task 1: Semantic Textual Similarity Estimation of English Sentence Pairs Using Regression Model over Pairwise Features
This paper describes the model UdL we proposed to solve the semantic textual similarity task of SemEval 2017 workshop. The track we participated in was estimating the semantics relatedness of a given set of sentence pairs in English. The best run out of three submitted runs of our model achieved a Pearson correlation score of 0.8004 compared to a hidden human annotation of 250 pairs. We used ra...
متن کاملSERGIOJIMENEZ at SemEval-2016 Task 1: Effectively Combining Paraphrase Database, String Matching, WordNet, and Word Embedding for Semantic Textual Similarity
In this paper, a system for semantic textual similarity, which participated in Task1 in SemEval 2016 (monolingual and crosslingual sub-tasks) is described. The system contains a preprocessing step that simplifies text using PPDB 2.0 and detects negations. Also, six lexical similarity functions were constructed using string matching, word embedding and synonyms-antonyms relations in WordNet. The...
متن کاملRev at SemEval-2016 Task 2: Aligning Chunks by Lexical, Part of Speech and Semantic Equivalence
We present the description of our submission to SemEval-2016 Task 2, for the sub-task of aligning pre-annotated chunks between sentence pairs and providing similarity and relatedness labels for the alignment. The objective of the task is to provide interpretable semantic textual similarity assessments by adding an explanatory layer to aligned chunks. We analysed the provided datasets, consideri...
متن کامل